perm filename SUMMAR[RDG,DBL] blob sn#663405 filedate 1982-06-19 generic text, type T, neo UTF8
<Summary of my Thesis Proposal>

	I am currently investigating how analogy may be used to
facilitate Knowledge Acquisition.
The goal is a module which will allow the domain expert to describe
a new domain object or set of objects, as analogous to some known object(s).
The remainder of this sketch walks through a simple example, to illustrate
the type of operations this analogizing module must be able to perform,
and where research time and effort must be channeled.

<<Meta note: please forgive the technical inaccuracies filling this short
  summary.  The goal is to present an idea of what the analogizer should
  be able to do, not to differentiate the various forms of meningitis.>>

Imagine a medical expert who tells us that
(*)	"Viral-meningitis is like Bacterial-meningitis,
	 except it is caused by a virus."
He would probably expect us (and hence any sophisticated system) to use
this assertion to conjecture that this new item, 
viral meningitis,
	is a disease,
	which is caused by a virus,
	whose SYMPTOMs are very similar to those of bacterial meningitis,
		(perhaps even specifying in what ways they are alike)
	but whose etiology can be totally different,
	and whose treatment should be quite distinct.

We will outline below how the analogizer will be able to reach these
conclusions; and indicate the types of facts to which this program must have
access.

For this example to work, the existing MedicalKB must already know a lot
about bacterial meningitis.
In particular, it knows that bacterial meningitis is a certain kind of
disease, with a particular etiology, set of symptoms and treatments.
Our goal is to derive facts and conjectures about viral meningitis, based on
those bacterial meningitis assertions.
(Stated another way, we are creating a unit for Viral-meningitis by
"Copying and Editing" the Bacterial-meningitis unit.
This involves discerning which of the Bacterial-meningitis slots are
appropriate for this neo-natal Viral-meningitis unit,
and what values should fill these slots.)

The basic procedure is an incremental "generate and test".
Various "suggestive" heuristics will propose hypotheses about
Viral-meningitis.  Other rules will then examine these proposals, pruning
those which are contradictory (or just very unlikely).
The conjectures which are accepted are recorded,
together with a body of reinforcing reasons.
This propose-and-verify cycle continues until no more generating rules
are triggered.
It is important to note that this process will be interactive --
at any point a rule may call upon the user to answer some question which
is outside the expertise of the analogizer or the current MedicalKB.
(Later rules may declare some earlier assertion faulty, and cause it, and
all of its effects, to be thrown out.
This will require some form of Reason Maintenance System -- see [Doyle].)
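The propose-and-verify cycle, with its recorded reasons and its retraction step, might be sketched in Python as follows. This is a minimal sketch under stated assumptions: facts are hypothetical (unit, slot, value) triples, `copy_isa` and `not_contradicted` are illustrative rules, the interactive questions to the user are not modeled, and `retract` is only a crude stand-in for a real Reason Maintenance System such as Doyle's.

```python
from dataclasses import dataclass, field

@dataclass
class Belief:
    reasons: list = field(default_factory=list)   # reinforcing reasons
    depends_on: set = field(default_factory=set)  # facts this one was derived from

def run_cycle(beliefs, generators, pruners):
    """Propose-and-verify until no generator yields a new, unpruned conjecture."""
    changed = True
    while changed:
        changed = False
        for gen in generators:
            # list() so the generator never sees the dict change under it
            for fact, reason, premises in list(gen(beliefs)):
                if fact in beliefs:
                    # Already accepted: only record an additional supporting reason.
                    if reason not in beliefs[fact].reasons:
                        beliefs[fact].reasons.append(reason)
                elif all(ok(fact, beliefs) for ok in pruners):
                    beliefs[fact] = Belief([reason], set(premises))
                    changed = True
    return beliefs

def retract(beliefs, fact):
    """Throw out a faulty fact and, recursively, everything derived from it."""
    if fact not in beliefs:
        return
    del beliefs[fact]
    for other in [f for f, b in beliefs.items() if fact in b.depends_on]:
        retract(beliefs, other)

def copy_isa(beliefs):
    """Hypothetical generator: if A is like B, propose A isa C for each B isa C."""
    proposals = []
    for (a, slot, b) in beliefs:
        if slot == "like":
            for (u, s, c) in beliefs:
                if u == b and s == "isa":
                    proposals.append(((a, "isa", c), f"{a} is like {b}",
                                      [(a, "like", b), (b, "isa", c)]))
    return proposals

def not_contradicted(fact, beliefs):
    """Hypothetical pruner: reject A isa C if A is already known NOT to be a C."""
    a, slot, c = fact
    return not (slot == "isa" and (a, "not-isa", c) in beliefs)

beliefs = {
    ("Viral-meningitis", "like", "Bacterial-meningitis"): Belief(["stated by expert"]),
    ("Bacterial-meningitis", "isa", "Disease"): Belief(["MedicalKB"]),
}
run_cycle(beliefs, [copy_isa], [not_contradicted])
```

Keeping `depends_on` on every accepted conjecture is what lets a later retraction remove "it, and all of its effects" rather than just the single faulty assertion.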

Enough overhead; on to the actual conjectures.
How did we know that Viral-meningitis is a disease?
(Realize that stating that
	"This sponge is like meningitis,
	 in that both have an excess storage of water"
does not imply that "this sponge" is a disease.)
We don't really know that this is true --
it does seem a plausible conjecture, though,
as we do know that viruses cause diseases,
and (*) stated that V-m was caused by a virus.
Such converse inferences should be viewed as reinforcements, not proofs.
(E.g., a virus can also cause the contamination of an otherwise pure culture, or ...)
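One way to keep such converse inferences as reinforcement rather than proof is to give each reason a weight below 1 and combine the weights so that belief grows without reaching certainty. The sketch below uses a noisy-OR combination; that technique and the weights are assumptions chosen for illustration, not something the proposal specifies.

```python
def combined_support(weights):
    """Noisy-OR: each independent reason lowers the residual doubt; with every
    weight below 1 the doubt stays positive (up to floating-point rounding)."""
    doubt = 1.0
    for w in weights:
        if not 0.0 <= w < 1.0:
            raise ValueError("heuristic reasons need a weight in [0, 1)")
        doubt *= 1.0 - w
    return 1.0 - doubt

# Hypothetical weights for two reasons supporting "Viral-meningitis isa Disease".
reasons = {
    "caused by a virus, and viruses cause diseases (converse)": 0.6,
    "like Bacterial-meningitis, which isa Disease (copy Isa)": 0.5,
}
support = combined_support(reasons.values())
```

With these made-up weights the two reasons together give a support of about 0.8: more than either alone, but still short of proof.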

Now we ask why we even considered whether V-m was a disease.
This was done because of a heuristic:
	If A is asserted to be like B,
	 consider setting A:Isa to the same value as B:Isa.
(That is, see if this class membership is consistent with other 
facts and conjectures about A.)
[There are many arguments "justifying" this heuristic.
A major one is that this Isa classification is often based on the
obvious perceived similarities, etc.]
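The "see if this class membership is consistent" step might look like the sketch below, which also covers the sponge case above. The slot-dictionary KB, the `DISJOINT` table and the function name are hypothetical; a real analogizer would test consistency against far more than pairwise class disjointness.

```python
# Assumed background fact: these classes cannot share members.
DISJOINT = {frozenset({"Disease", "PhysicalObject"})}

def consider_isa(a, b, kb, disjoint=DISJOINT):
    """If A is asserted to be like B, propose B's Isa classes for A,
    keeping only those consistent with what is already known about A."""
    known = kb.get(a, {}).get("isa", set())
    proposed = set()
    for cls in kb.get(b, {}).get("isa", set()):
        if any(frozenset({cls, k}) in disjoint for k in known):
            continue  # would contradict an existing classification of A
        proposed.add(cls)
    return proposed

kb = {
    "Bacterial-meningitis": {"isa": {"Disease"}},
    "Viral-meningitis": {},
    "this-sponge": {"isa": {"PhysicalObject"}},
}
```

Here `consider_isa("Viral-meningitis", "Bacterial-meningitis", kb)` proposes `{"Disease"}`, while the same call for `"this-sponge"` proposes nothing.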

<<Issue: what is the correct order of rules -- why check for Isa so early?
Again, common sense -- this is a good discriminant, and useful to know
ASAP.  This "is a particular kind of" relation is very important,
and should be carried over to the other analogue (here "viral meningitis") 
whenever possible.
Given that it can be thrown out later, there isn't much risk associated with
being wrong ...>>

Having established that these two analogues are both diseases,
we might consider what it means for two diseases to be considered similar.
In general this means they have similar symptoms,
and possibly similar causes.
Finding that V-m has a name similar to B-m's reinforces the conjecture that
they share similar symptoms
(after all, these two diseases may have been named long before anyone understood
the organisms responsible).
(This test is performed by a medical-domain-specific heuristic.)
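The naming heuristic might be sketched as below. It assumes hyphenated unit names like those used here and treats any shared term as one defeasible reason for symptom similarity; the tokenization and both function names are hypothetical.

```python
def shared_name_terms(a, b):
    """Terms common to two hyphenated unit names, ignoring case."""
    return set(a.lower().split("-")) & set(b.lower().split("-"))

def symptom_similarity_reasons(a, b):
    """Medical-domain heuristic: diseases were often named for how they
    present, so a shared name term reinforces 'similar symptoms'."""
    common = shared_name_terms(a, b)
    if not common:
        return []
    return [f"{a} and {b} share the name term(s) {sorted(common)}"]
```

The result is a list of reasons rather than a verdict, so it can be attached to the symptom-similarity conjecture alongside other support.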

We might then ask whether the symptoms are identical.  This we don't know yet,
but can determine by asking the nearby domain expert.

The fact (given in the initial analogy) that these two diseases have different
causes means their respective etiologies will be rather different --
although the similarity of the symptoms places some constraints on how
diverse the etiologies can be.
The ontogeny of the two organisms, likewise, must satisfy some
connecting constraints.
(Some "reasoning from first (medical) principles" will lead to clear 
differences between these two organisms.  Here, this is based on the 
distinction between bacteria and viruses, possibly augmented by answers given
by the domain expert (DE).)

Finally, the desired treatments may be radically different -- clearly
antibiotics (which may have some effect on the bacterial disease)
will be useless against the viral form.
[This could have been deduced from a general fact about etiology: that
different causes imply different end results.]
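The rule that treatment should be related to cause could filter a copied Treatment slot roughly as follows. The `ACTS_ON` table, the treatment names and the function name are illustrative assumptions; a real system would draw these facts from the MedicalKB, and would ask the expert about anything it cannot classify.

```python
# Hypothetical pharmacological facts: the kind of agent each treatment acts on.
ACTS_ON = {
    "antibiotics": "bacterium",
    "supportive-care": "any",
}

def transferable_treatments(treatments, new_cause):
    """Apply 'Treatment should be related to Cause' when copying a
    Treatment slot to an analogue with a different cause."""
    keep, drop, ask = [], [], []
    for t in treatments:
        target = ACTS_ON.get(t)
        if target is None:
            ask.append(t)    # outside the KB: ask the domain expert
        elif target in ("any", new_cause):
            keep.append(t)
        else:
            drop.append(t)   # e.g. antibiotics against a virus
    return keep, drop, ask
```

For `(["antibiotics", "supportive-care", "drug-x"], "virus")` this keeps supportive care, drops antibiotics, and refers the unknown `drug-x` to the expert.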

It seems we need quite a bit of knowledge, about various things,
to reach these "obvious" conclusions.
We had to employ facts particular to this medical domain
(including pharmacological facts and knowledge of infections in general,
e.g. that diseases are considered similar if they exhibit the same "behaviour",
a judgment reinforced by naming conventions),
as well as facts about analogies in general, observations about salience, etc.
In addition, we needed to include many "common sense" notions,
including a general understanding of causality
(e.g. Treatment should be related to Cause),
and a variety of general heuristics,
including the "Copy Isa whenever possible" rule.
How to encode these diverse facts is a major research challenge.

To a first approximation, one might think of this task as
an automated "copy and edit" process --
given one well fleshed out analogue (here, representing the class of 
bacterial meningitis), we want to transfer certain facts onto the other
analogue, viral meningitis.
It sounds quite simple at this level -- merely reading down the facts
associated with the vehicle, and deciding how to modify each to apply to
the new topic
(i.e. copy it as it stands; copy it after performing a simple character
substitution; decide this property is not relevant to the new analogue and
NOT copy it; etc.).
While even this trivial task is unquestionably important, and time-saving,
it is only the "straightforward" component of a more exciting program(me).
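The trivial version of copy and edit, with the three per-slot choices just listed, could look like the sketch below. Units are modeled as plain slot-to-string dictionaries and the slot names are illustrative assumptions. Deciding WHICH slots are irrelevant or need substitution is the hard part, and here it is simply passed in.

```python
def copy_and_edit(source, substitutions, irrelevant, overrides):
    """Build a new unit from `source`, slot by slot."""
    new_unit = {}
    for slot, value in source.items():
        if slot in irrelevant or slot in overrides:
            continue                             # not copied (overrides added below)
        for old, new in substitutions.items():
            value = value.replace(old, new)      # simple character substitution
        new_unit[slot] = value                   # otherwise copied as it stands
    new_unit.update(overrides)                   # explicitly stated differences win
    return new_unit

bacterial = {
    "name": "Bacterial-meningitis",
    "isa": "Disease",
    "cause": "bacterium",
    "symptoms": "fever, headache, stiff neck",
    "gram-stain": "used to identify the organism",
}
viral = copy_and_edit(bacterial, {"Bacterial": "Viral"},
                      {"gram-stain"}, {"cause": "virus"})
```

The `overrides` argument plays the role of the "except it is caused by a virus" clause in (*).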

Some comments about analogies in general.
First, notice that any analogy is obvious in retrospect.
Once one has described how the analogues A and B are similar 
(e.g. in that they are both Ys), any feature which you find A and B share
will be trivial -- of course P(x) holds for A and for B; it holds for all
members of Y!
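The "obvious in retrospect" point can be seen in a few lines of inheritance: once A and B are both recorded as Ys, any property stored on Y is found for both by the same lookup. The names A, B, Y and P come from the text; the dictionary representation and single inheritance are assumptions.

```python
# Hypothetical single-inheritance hierarchy and slot values, echoing the text.
ISA = {"A": "Y", "B": "Y"}
PROPS = {"Y": {"P": True}}

def holds(unit, prop):
    """Look prop up on the unit itself, then along its Isa chain."""
    while unit is not None:
        if prop in PROPS.get(unit, {}):
            return PROPS[unit][prop]
        unit = ISA.get(unit)
    return None
```

Nothing about A or B individually explains why P holds for both; the explanation lives entirely in the shared class Y.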

The real challenge 
(and the AI contribution of this work)
is in finding that common theory,
which explains the body of (relevant) facts about A,
i.e. those which should be carried over to B.
(We will call that theory "an abstraction" of A.)

The analogizer will NOT be limited to finding just those connections already
available given the current representation.
Instead it will have the capability to produce new and different formulations
of the analogue(s); it is in this new representation that the similarities
will become obvious.

The actual set of features used for this transference may have to be
generated -- in particular, the expert is NOT limited to the particular
representation used up to here.
Consider the connection between blood and lymph: as both are circulating
bodily fluids, it is not surprising to find many correlations.
(Even at this level, the expert could still save considerable time by
NOT having to type in such tedious facts.)
Now consider connecting the lymph system with the neurological network.
Well... both may be loops for feedback, ...
Ahhh -- so both are instances of circuits.  Had I known this earlier
I could simply have claimed that each Isa Circuit, and each would inherit
those facts common to all such circuits.
As I mentioned, in hindsight such things are obvious.
The point is that we might not have realized this beforehand!
This is the challenge -- to meaningfully find the commonality, even when
it is NOT explicit, but rather has to be teased out...
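Once the common features have been noticed, recording the new abstraction is the easy part, as in the sketch below. It only hoists features that are already shared EXPLICITLY, which is exactly the case the text says is not enough; the feature names and the function are hypothetical, and finding the implicit commonality remains the research problem.

```python
def hoist_common_class(name, a, b, facts, isa):
    """Create abstraction `name` from the facts a and b share, make both
    Isa `name`, and keep only their distinguishing facts locally."""
    shared = facts[a] & facts[b]
    if not shared:
        return set()
    facts[name] = set(shared)
    for unit in (a, b):
        facts[unit] -= shared                  # now inherited from `name`
        isa.setdefault(unit, set()).add(name)
    return shared

facts = {
    "lymph-system": {"loop", "carries-feedback", "carries-lymph"},
    "nervous-system": {"loop", "carries-feedback", "carries-nerve-impulses"},
}
isa = {}
shared = hoist_common_class("Circuit", "lymph-system", "nervous-system", facts, isa)
```

After the call, each system keeps only its distinguishing fact and both are recorded as Isa Circuit.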
-----

Why useful?

More accurate:
Consider Teiresias -- it spared the expert the task of 

Faster:
a) More natural -
consider how experts communicate amongst themselves --
"this patient's just like that one", or "these chemicals are just like those",
or "think of this as water flow", ...
b) More succinct -
	fewer symbols used to encode more stuff.

-----
For this reason I used the (more familiar to me) domain of programming
for my more detailed examples.
Its big drawback is its artificiality and artifactualness.

Hence I need some examples -- hence my wish to "connect up" with you.>>


Consider how analogies might be used by people wrt computer programs.

(1) Analyze/understand program A, based on knowledge about B, and connection
	of A to B.

(2) Generate new code, by analogy with existing code
	(a) Create a new program
	(b) Augment an existing program

(3) Modify existing code, in running/errorful program
[Note similar types of bugs, ... in different programs]

--------
Scenarios:
   o	I know about MAPCAR, and am told EVERY is similar, but ...
	   Deduce what EVERY does.
   o    Given spec for EVERY, copy and edit MAPCAR to perform this function.
	   (or MAPC, or MAPLIST, ...)
   o	Concoct the macro which is used to implement these mapping functions   
   o	Optimize code -- same I/O, but faster or using less space (e.g. DMAPCAR)
   o	Given PR-STASH, generate PL-STASH, which works with Property lists,
	   rather than the hash table PR-STASH used.
   o	Figure out how to write PL-UNSTASH, 
	   given the relation of PR-STASH to PR-UNSTASH, and PL-STASH.
   o	Given MACLISP code for function F, rewrite F in InterLisp.
		or Pascal, or FORTRAN, SETL, APL, ...
   o	Change the data structure
	    -- e.g. deal with linked lists, not bit vectors
   o	Change the type of objects program X deals with 
	    (not just their representation)
	    Given UNION, and def'n of SET and LIST, 
		write (various flavors of) APPEND
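For the first two scenarios, the edit from MAPCAR to EVERY can be illustrated with Python stand-ins. This assumes EVERY has its usual Lisp meaning (true iff the predicate holds for every element) and ignores Lisp details such as NIL as false and multiple list arguments; `mapcar` and `every` here are illustrative re-implementations, not the MACLISP or InterLisp definitions.

```python
def mapcar(fn, lst):
    """MAPCAR: apply fn to each element, collecting the results."""
    result = []
    for x in lst:
        result.append(fn(x))
    return result

def every(pred, lst):
    """EVERY, obtained by editing MAPCAR: keep the same traversal, but
    instead of collecting results, stop at the first false one."""
    for x in lst:
        if not pred(x):
            return False
    return True
```

The "copy" is the loop over the list; the "edit" replaces result collection with a short-circuiting AND and changes the return value from a list to a truth value.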

-------
Why this domain -- of programming?
* one has "complete" information -- 
	hence if one representation doesn't work, can produce another one
	(starting from that "primitive" code) -- `reformulation'
* perhaps will be used to concoct a new representation -- eg programs
	which use certain tricks, or written by Fred (who likes capital
	letters)
	[is this exactly "learning by ostension"?]

-------
How can Program A be like Program B?
* wrt I/O behavior
   o (abstract) functionality -- e.g. both "sort"
   o Data structures involved -- eg linked list, not bit string
   o "interpretation of data" -- eg sets vs lists
   o Timing -- either absolute (CPU or real time) or asymptotic
* wrt structure
   o procedure calling sequences -- eg CAR recursive, ...
   o details of code -- eg variable names
	Nuances of coding style of
		Author, Author's school, ...
	    (e.g. DBL: Long variable names, many globalvars)
   o Language used for implementation
   o Abstract description of code
	"tricks used" -- e.g. version space, or path compression
	same task -- e.g. learning
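Similarity "wrt I/O behavior" can at least be tested empirically: run both programs on the same inputs and compare. The sketch below compares a hand-written insertion sort with Python's built-in `sorted`; the sample inputs are arbitrary, and agreement on samples is evidence of shared functionality, not proof of it.

```python
def insertion_sort(xs):
    """A hand-written sort whose structure differs from the built-in's."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def same_io_behavior(f, g, inputs):
    """True if f and g give equal outputs on every sample (each gets a copy)."""
    return all(f(list(x)) == g(list(x)) for x in inputs)

samples = [[], [3], [2, 1], [5, 1, 4, 1, 3]]
```

Note that list reversal agrees with sorting on the first three samples; only the last one exposes the difference, which is why sample choice matters.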